{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Copyright 2018 MD.ai, Inc.   \n",
    "Licensed under the Apache License, Version 2.0*\n",
    "\n",
    "# Create additional annotations using the MD.ai Annotator "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Pneumonia Detection Challenge: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge        \n",
    "\n",
    "For further data exploration and to create your own additional annotations, clone the MD.ai project at:  \n",
    "https://public.md.ai/annotator/project/LxR6zdR2.  \n",
    "\n",
    "You can then create your own team, add new labels and additional annotations. The “Users“ tab will allow you to create teams and assign exams to team members. You can track progress and export your new annotations in JSON format.\n",
    "\n",
    "Further instructions and videos are available at https://docs.md.ai."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "z0i1JLTF8_Pu"
   },
   "source": [
    "### Clone the Annotator Project\n",
    "\n",
    "RSNA Pneumonia Detection Challenge Annotator project URL:  \n",
    "https://public.md.ai/annotator/project/LxR6zdR2/workspace. \n",
    "\n",
    "To add annotations to the cloned project, you need to clone the project first. \n",
    "\n",
    "First, navigate to the original project URL (above), click on \"Clone Project\" button. \n",
    "\n",
    "![GitHub Logo](https://raw.githubusercontent.com/mdai/ml-lessons/master/images/kaggle-project-clone-small.gif)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FdivV_kX8_PN"
   },
   "source": [
    "**Intro to deep learning for medical imaging lessons**\n",
    "\n",
    "- Lesson 1. Classification of chest vs. adominal X-rays using TensorFlow/Keras [Github](https://github.com/mdai/ml-lessons/blob/master/lesson1-xray-images-classification.ipynb) [Annotator](https://public.md.ai/annotator/project/PVq9raBJ)\n",
    "\n",
    "- Lesson 2. Lung X-Rays Semantic Segmentation using UNets. [Github](https://github.com/mdai/ml-lessons/blob/master/lesson2-lung-xrays-segmentation.ipynb)\n",
    "[Annotator](https://public.md.ai/annotator/project/aGq4k6NW/workspace) \n",
    "\n",
    "- Lesson 3. RSNA Pneumonia detection using Kaggle data format [Github](https://github.com/mdai/ml-lessons/blob/master/lesson3-rsna-pneumonia-detection-kaggle.ipynb) [Annotator](https://public.md.ai/annotator/project/LxR6zdR2/workspace) \n",
    "  \n",
    "- Lesson 3. RSNA Pneumonia detection using MD.ai python client library [Github](https://github.com/mdai/ml-lessons/blob/master/lesson3-rsna-pneumonia-detection-mdai-client-lib.ipynb) [Annotator](https://public.md.ai/annotator/project/LxR6zdR2/workspace)\n",
    "\n",
    "- MD.ai python client libray URL: https://github.com/mdai/mdai-client-py\n",
    "- MD.ai documentation URL: https://docs.md.ai/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "lVBBH_EF8_PU"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import json\n",
    "import pydicom\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0a1bLOhX8_PW"
   },
   "source": [
    "### Import the `mdai` library\n",
    "\n",
    "Run the block below to install the `mdai` client library into your python environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "id": "cYvDgipk8_PX",
    "outputId": "28c7fa59-b481-4b40-dfe1-3e2f49cbb9a5"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'0.0.5'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "!pip install --upgrade --quiet mdai\n",
    "import mdai\n",
    "mdai.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "WZQjgjnK8_Pb"
   },
   "outputs": [],
   "source": [
    "# Root directory of the project \n",
    "ROOT_DIR = os.path.abspath('./lesson3-data')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "RrVjmqtJ8_Po"
   },
   "source": [
    "### Create an `mdai` client\n",
    "\n",
    "The mdai client requires an access token, which authenticates you as the user. To create a new token or select an existing token, navigate to the \"Personal Access Tokens\" tab on your user settings page at the specified MD.ai domain (e.g., public.md.ai).\n",
    "\n",
    "**Important: keep your access tokens safe. Do not ever share your tokens.**"
   ]
  },
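  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to keep the token out of the notebook is to read it from an environment variable. The following is a minimal sketch, not part of the `mdai` library: the variable name `MDAI_ACCESS_TOKEN` and the helper `read_access_token` are assumptions made for illustration.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "def read_access_token(var_name='MDAI_ACCESS_TOKEN'):\n",
    "    # var_name is a hypothetical environment variable; choose any name you like.\n",
    "    token = os.environ.get(var_name)\n",
    "    if not token:\n",
    "        raise RuntimeError('Set the {} environment variable first.'.format(var_name))\n",
    "    return token\n",
    "```\n",
    "\n",
    "The returned value can then be passed as `access_token` when creating the client."
   ]
  },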
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "id": "XB28YDKw8_Pq",
    "outputId": "68eed3fe-121f-4966-f53d-bd734025e262"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Successfully authenticated to public.md.ai.\n"
     ]
    }
   ],
   "source": [
    "mdai_client = mdai.Client(domain='public.md.ai', access_token=\"MY_PERSONAL_ACCESS_TOKEN\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Zl509Hs38_Pu"
   },
   "source": [
    "### Define project\n",
    "\n",
    "Define a project you have access to by passing in the project id. The project id can be found in the URL in the following format: `https://public.md.ai/annotator/project/{project_id}`.\n",
    "\n",
    "For example, `project_id` would be `XXXX` for `https://public.md.ai/annotator/project/XXXX`.\n",
    "\n",
    "Specify optional `path` as the data directory (if left blank, will default to current working directory)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "uXUv3t_98_Pv"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using path '/home/txia/mdai-git/ml-lessons/lesson3-data' for data.\n",
      "Preparing annotations export for project EoBKoMBG...                                                \n",
      "Preparing images export for project EoBKoMBG...                                                     \n",
      "Using cached images data for project EoBKoMBG.\n",
      "Using cached annotations data for project EoBKoMBG.\n"
     ]
    }
   ],
   "source": [
    "# use cloned project_id! \n",
    "CLONED_PROJECT_ID = 'EoBKoMBG' \n",
    "p = mdai_client.project(project_id=CLONED_PROJECT_ID, path=ROOT_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "PXiRWnxX8_Pz"
   },
   "source": [
    "## Prepare data\n",
    "\n",
    "### Grab the label ids. You'll need these to create a label dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 107
    },
    "colab_type": "code",
    "id": "EDecTsz28_P0",
    "outputId": "aeec82e1-7a0b-477f-fd37-6ec1a5a7b4df"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Label Group, Id: G_q563m2, Name: Default group\n",
      "\tLabels:\n",
      "\tId: L_NBy1aB, Name: Lung Opacity\n",
      "\tId: L_Wdjx2B, Name: No Lung Opacity\n",
      "\n"
     ]
    }
   ],
   "source": [
    "p.show_label_groups()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "gB22GyWU8_P4"
   },
   "source": [
    "### Set label ids\n",
    "\n",
    "Selected label ids must be explicitly set by `Project#set_label_ids` method in order to prepare datasets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Note: Your label ids and dataset ids will be different. Use show_label_groups() and show_datasets() to find your specific ids."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 71
    },
    "colab_type": "code",
    "id": "ChuEtYTz8_P4",
    "outputId": "67326c39-cc58-414b-9189-f9337e24a9f2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'L_Wdjx2B': 0, 'L_NBy1aB': 1}\n",
      "None\n",
      "bbox\n"
     ]
    }
   ],
   "source": [
    "# this maps label ids to class ids\n",
    "# make sure this matches the Kaggle dataset\n",
    "# target = 0: No Lung Opacity \n",
    "# target = 1: Lung Opacity \n",
    "labels_dict = {\n",
    "    'L_Wdjx2B':0, # target = 0, background \n",
    "    'L_NBy1aB':1, # target = 1, lung opacity \n",
    "              }\n",
    "\n",
    "print(labels_dict)\n",
    "p.set_labels_dict(labels_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use this formula to find your specific dataset id and use it to load dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 71
    },
    "colab_type": "code",
    "id": "dFJRPTL18_P8",
    "outputId": "a198f18a-ab68-4eda-d28e-af888206d35e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Datasets:\n",
      "Id: D_gEX5do, Name: stage 1 train\n",
      "\n"
     ]
    }
   ],
   "source": [
    "p.show_datasets()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "1K-f6JdP8_QB"
   },
   "outputs": [],
   "source": [
    "dataset = p.get_dataset_by_id('D_gEX5do')\n",
    "dataset.prepare()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 53
    },
    "colab_type": "code",
    "id": "xBIVY6IL8_QE",
    "outputId": "833066ff-317a-4c9d-f38b-1e7c9269b16d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Label id: L_Wdjx2B, Class id: 0, Class text: No Lung Opacity\n",
      "Label id: L_NBy1aB, Class id: 1, Class text: Lung Opacity\n"
     ]
    }
   ],
   "source": [
    "dataset.show_classes()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# generate kaggle labels format (see, stage_1_train_labels.csv)\n",
    "\n",
    "# use dataset object from above \n",
    "image_ids = dataset.get_image_ids() \n",
    "\n",
    "kaggle_data = []\n",
    "for image_id in image_ids: \n",
    "    ds = pydicom.dcmread(image_id)    \n",
    "    anns = dataset.get_annotations_by_image_id(image_id)\n",
    "    for ann in anns: \n",
    "        labelId = ann['labelId']\n",
    "        target = int(dataset.label_id_to_class_id(labelId))\n",
    "\n",
    "        if target == 0: \n",
    "            kaggle_data.append((ds.PatientID, None, None, None, None, target))\n",
    "\n",
    "        elif target == 1: \n",
    "\n",
    "            x = ann['data']['x']\n",
    "            y = ann['data']['y']\n",
    "            height = ann['data']['height']\n",
    "            width = ann['data']['width']\n",
    "            kaggle_data.append((ds.PatientID, x, y, height, width, target))\n",
    "        else: \n",
    "            raise ValueError('Target {}  is invalid.'.format(target))\n",
    "            \n",
    "kaggle_df = pd.DataFrame(kaggle_data, columns=['patientId', 'x', 'y', 'width', 'height', 'Target'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "kaggle_df.to_csv('stage_1_train_cloned.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>patientId</th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "      <th>width</th>\n",
       "      <th>height</th>\n",
       "      <th>Target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>00322d4d-1c29-4943-afc9-b6754be640eb</td>\n",
       "      <td>111.59540</td>\n",
       "      <td>92.98391</td>\n",
       "      <td>634.40922</td>\n",
       "      <td>497.87585</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>003d8fa0-6bf1-40ed-b54c-ac657f8495c5</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>00313ee0-9eaa-42f4-b0ab-c148ed3241cd</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0004cfab-14fd-4e49-80ba-63a80b6bddd6</td>\n",
       "      <td>175.58974</td>\n",
       "      <td>281.97101</td>\n",
       "      <td>429.23523</td>\n",
       "      <td>286.53734</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>00436515-870c-4b36-a041-de91049b9ab4</td>\n",
       "      <td>350.52874</td>\n",
       "      <td>609.69195</td>\n",
       "      <td>117.70117</td>\n",
       "      <td>133.00229</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                              patientId          x          y      width  \\\n",
       "0  00322d4d-1c29-4943-afc9-b6754be640eb  111.59540   92.98391  634.40922   \n",
       "1  003d8fa0-6bf1-40ed-b54c-ac657f8495c5        NaN        NaN        NaN   \n",
       "2  00313ee0-9eaa-42f4-b0ab-c148ed3241cd        NaN        NaN        NaN   \n",
       "3  0004cfab-14fd-4e49-80ba-63a80b6bddd6  175.58974  281.97101  429.23523   \n",
       "4  00436515-870c-4b36-a041-de91049b9ab4  350.52874  609.69195  117.70117   \n",
       "\n",
       "      height  Target  \n",
       "0  497.87585       1  \n",
       "1        NaN       0  \n",
       "2        NaN       0  \n",
       "3  286.53734       1  \n",
       "4  133.00229       1  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kaggle_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ",patientId,x,y,width,height,Target\r\n",
      "0,00322d4d-1c29-4943-afc9-b6754be640eb,111.5954,92.98391,634.40922,497.87585,1\r\n",
      "1,003d8fa0-6bf1-40ed-b54c-ac657f8495c5,,,,,0\r\n",
      "2,00313ee0-9eaa-42f4-b0ab-c148ed3241cd,,,,,0\r\n",
      "3,0004cfab-14fd-4e49-80ba-63a80b6bddd6,175.58974,281.97101,429.23523,286.53734,1\r\n",
      "4,00436515-870c-4b36-a041-de91049b9ab4,350.52874,609.69195,117.70117,133.00229,1\r\n",
      "5,00436515-870c-4b36-a041-de91049b9ab4,177.50805,229.51724,249.52643,209.50804,1\r\n",
      "6,00436515-870c-4b36-a041-de91049b9ab4,645.95862,349.57241,149.48047,105.93103,1\r\n"
     ]
    }
   ],
   "source": [
    "!cat stage_1_train_cloned.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**At this point, you could merge the training labels in this csv with the original training labels csv (i.e., stage_1_train_labels.csv); either using the command line, or read the csv data via pandas and merge the data.**"
   ]
  },
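  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pandas route could look like the sketch below. It assumes both csv files are in the current working directory, and the output name `stage_1_train_merged.csv` is a hypothetical choice. Because the csv above was written with its index column, `index_col=0` is used when reading it back.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "columns = ['patientId', 'x', 'y', 'width', 'height', 'Target']\n",
    "\n",
    "# Original Kaggle labels and the labels exported from the cloned project\n",
    "original_df = pd.read_csv('stage_1_train_labels.csv')\n",
    "cloned_df = pd.read_csv('stage_1_train_cloned.csv', index_col=0)\n",
    "\n",
    "# Stack the two tables and drop rows that are exact duplicates\n",
    "merged_df = pd.concat([original_df[columns], cloned_df[columns]], ignore_index=True)\n",
    "merged_df = merged_df.drop_duplicates().reset_index(drop=True)\n",
    "\n",
    "# Hypothetical output file name\n",
    "merged_df.to_csv('stage_1_train_merged.csv', index=False)\n",
    "```\n",
    "\n",
    "Whether to drop duplicates depends on how your new annotations relate to the original ones, so treat that step as optional."
   ]
  },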
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "name": "lesson3-rsna-pneumonia-detection-mdai-client-lib.ipynb",
   "provenance": [],
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
